Word Alignment without NULL Words

Authors

Philip Schulz

Wilker Aziz

Khalil Sima'an

Abstract

In word alignment certain source words are only needed for fluency reasons and do not have a translation on the target side. Most word alignment models assume a target NULL word from which they generate these untranslatable source words. Hypothesising a target NULL word is not without problems, however. For example, because this NULL word has a position, it interferes with the distribution over alignment jumps. We present a word alignment model that accounts for untranslatable source words by generating them from preceding source words. It thereby removes the need for a target NULL word and only models alignments between word pairs that are actually observed in the data. Translation experiments on English paired with Czech, German, French and Japanese show that the model outperforms its traditional IBM counterparts in terms of BLEU score.
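The idea of generating an untranslatable source word from the preceding source word, rather than from a target NULL word, can be illustrated with a small mixture sketch. This is not the authors' parameterisation: the uniform alignment prior (as in IBM Model 1), the fixed mixing weight `p_context`, the function name `source_word_prob`, and the toy probability tables are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def source_word_prob(
    src: List[str],
    j: int,
    tgt: List[str],
    t_table: Dict[Tuple[str, str], float],
    bigram: Dict[Tuple[str, str], float],
    p_context: float,
) -> float:
    """Marginal probability of src[j] under a two-component mixture (illustrative only)."""
    # Translation component: uniform alignment over the observed target words.
    # There is no NULL position, so only real word pairs are scored.
    trans = sum(t_table.get((src[j], e), 0.0) for e in tgt) / len(tgt)
    # Context component: generate the word from the preceding source word.
    prev = src[j - 1] if j > 0 else "<s>"
    ctx = bigram.get((src[j], prev), 0.0)
    return (1.0 - p_context) * trans + p_context * ctx


# Hypothetical toy data: German source, English target.
src = ["ich", "gehe", "nach", "Hause"]
tgt = ["I", "go", "home"]
t_table = {("ich", "I"): 0.9, ("gehe", "go"): 0.8, ("Hause", "home"): 0.7}
bigram = {("nach", "gehe"): 0.5}  # P(nach | preceding word is gehe)

p_nach = source_word_prob(src, 2, tgt, t_table, bigram, p_context=0.2)
p_ich = source_word_prob(src, 0, tgt, t_table, bigram, p_context=0.2)
```

In this toy setting, "nach" has no translation entry, so all of its probability mass comes from the source-context component, while "ich" is explained entirely by the translation component. Because no NULL token occupies a target position, a distribution over alignment jumps would only ever be computed between observed target positions.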

Similar resources

Fast Collocation-Based Bayesian HMM Word Alignment

We present a new Bayesian HMM word alignment model for statistical machine translation. The model is a mixture of an alignment model and a language model. The alignment component is a Bayesian extension of the standard HMM. The language model component is responsible for the generation of words needed for source fluency reasons from source language context. This allows for untranslatable source...

Improving IBM Word Alignment Model 1

We investigate a number of simple methods for improving the word-alignment accuracy of IBM Model 1. We demonstrate reduction in alignment error rate of approximately 30% resulting from (1) giving extra weight to the probability of alignment to the null word, (2) smoothing probability estimates for rare words, and (3) using a simple heuristic estimation method to initialize, or replace, EM train...

ProAlign: Shared Task System Description

ProAlign combines several different approaches in order to produce high quality word alignments. Like competitive linking, ProAlign uses a constrained search to find high scoring alignments. Like EM-based methods, a probability model is used to rank possible alignments. The goal of this paper is to give a bird’s eye view of the ProAlign system to encourage discussion and comparison. 1 Alig...

Improving bilingual alignment models: Cognate identification, length dependence, and phrases

Determining exactly how words in a French sentence correspond to their counterparts in an English translation is an essential component of a machine translation system. For example, given the sentences Je suggère que tu arrives à l’heure and I suggest you arrive on time, we might hope to align je to I, suggère to suggest, and so forth. The IBM Models use pure distributional statistics to determ...

You'll Take the High Road and I'll Take the Low Road: Using a Third Language to Improve Bilingual Word Alignment

While language-independent sentence alignment programs typically achieve a recall in the 90 percent range, the same cannot be said about word alignment systems, where normal recall figures tend to fall somewhere between 20 and 40 percent, in the language-independent case. As words (and phrases) for various reasons are more interesting to align than sentences, we need methods to increase word al...

Publication date: 2016
